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Abstract —In this paper, based on a successively accuracy- 
increasing approximation of the £o norm, we propose a new 
algorithm for recovery of sparse vectors from underdetermined 
measurements. The approximations are realized with a certain 
class of concave functions that aggressively induce sparsity and 
their closeness to the £ 0 norm can be controlled. We prove that 
the series of the approximations asymptotically coincides with 
the l\ and £o norms when the approximation accuracy changes 
from the worst fitting to the best fitting. When measurements are 
noise-free, an optimization scheme is proposed which leads to a 
number of weighted l\ minimization programs, whereas, in the 
presence of noise, we propose two iterative thresholding methods 
that are computationally appealing. A convergence guarantee 
for the iterative thresholding method is provided, and, for a 
particular function in the class of the approximating functions, 
we derive the closed-form thresholding operator. We further 
present some theoretical analyses via the restricted isometry, null 
space, and spherical section properties. Our extensive numerical 
simulations indicate that the proposed algorithm closely follows 
the performance of the oracle estimator for a range of sparsity 
levels wider than those of the state-of-the-art algorithms. 

Index Terms —Compressed Sensing (CS), Nonconvex Opti¬ 
mization, Iterative Thresholding, the LASSO Estimator, Oracle 
Estimator. 

EDICS Category: DSP-CPSL, DSP-RECO, SAM-CSSM, 
OPT-NCVX 


I. Introduction 


sampling of analog signals 0 array signal processing 
and radar [8), to name a few. 

A general formulation for the sampling in the compressed 
fashion is 

b = Ax + w, (1) 

in which b € W is the vector of measurements, A € 
l" xm (n < m) is the sensing matrix, x £ R m is the 
unknown sparse vector with s <C m nonzero elements, and 
w is the probable additive noise. In a nutshell, given b 
and knowing A, the goal in recovery of sparse vector from 
compressed measurements is to accurately estimate x. As 0 
is underdetermined, recovery of x from b is ill-posed unless 
we know a priori that x resides in a low-dimensional space. 
Sparsity enters the game, and the sparsest solution which is 
consistent to the measurements is sought via 

min||x||o subject to ||Ax — b|j < e (2) 

X 

in which |jxj| 0 designates the so-called £ 0 norm of x defined 
as its number of nonzero entries, || • || stands for the t 2 norm, 
and e > ||w|| is some constant. 

Under certain circumstances, the solution to 0 is close 
to x in 0 0 [ 10|, and, particularly, when w = 0 (i.e., 
measurements are noise-free). 


D URING the last decade with the appearance of theoret¬ 
ical and experimental results showing the possibility of 
recovering sparse signals from undersampled measurements, 
there has been an explosion of applications gaining from the 
reduction in the sampling rate. In fact, any field of science or 
engineering, where sampling a signal is part of the processing 
task, can potentially benefit from the line of research ongoing 
in the domain of compressed sensing (CS). This can be 
witnessed with applications in sciences such as quantum state 
tomography in physics [ lj, imaging in astrophysics [2j, bacte¬ 
rial community reconstruction in biology [|3j, and genotyping 
in genetics Q. In engineering, the number of applications 
is tremendous and includes magnetic resonance imaging 
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min ||x||o subject to Ax = b 

X 

(3) 

exactly recovers x |9|. Programs ([2]i and Q are 
NP-hard [IT); however, their convex relaxations. 

generally 

min ||x||i subject to ||Ax — b|j < e 

X 

(4) 

and 


min ||x||! subject to Ax = b, 

(5) 


can be considered instead. Convexification makes the recovery 
tractable at the cost of increasing the sampling rate needed for 
stable recovery along with increase in the reconstruction error 
(9). A hot topic of research is, therefore, to propose recovery 
algorithms to push the sampling rate and reconstruction error 
as much as practically possible toward the intrinsic bounds of 
£q minimization. To this end, several nonconvex alternatives 
for ([Tj and Q have been proposed in the literature. The list for 
the noise-free recovery is already rich (cf. 1121—1161), yet, in 
the noisy recovery, there is still much room for improvement. 
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A. Contribution 

In this paper, we propose a new algorithm for recovery of 
sparse vectors. Although, as will be shown in the numerical 
simulations, the proposed algorithm outperforms some of the 
state-of-the-art algorithms in the noise-free recovery, the main 
goal of introducing this method is to provide an effective 
algorithm for the more challenging problem of recovery from 
compressed noisy measurements. The core idea of the pro¬ 
posed method is to closely approximate the £q norm with a 
certain class of concave functions and then to minimize this 
approximation subject to the constraints. However, in contrast 
to the ideas of ED and where a fixed approximation is 
used, we use a series of approximations in which the accuracy 
of the approximations improves as the algorithm proceeds. 
The proposed algorithm is, hence, called SCSA standing for 
successive concave sparsity approximation. 

In the noise-free case, our proposed algorithm involves 
solving some weighted i\ minimization programs which is 
computationally demanding. Nonetheless, in the noisy case, 
we utilize an iterative thresholding method ED to decrease the 
complexity and derive a closed-form solution for the thresh¬ 
olding operator. In addition, we theoretically characterize the 
conditions under which the proposed iterative thresholding 
method converges. 

On the theoretical side, the SCSA algorithm is supported 
by some guarantees derived from the restricted isometry [ 1 8]1, 
null space ED- and spherical section | |20) properties. On the 
numerical side, extensive empirical evaluations in the noisy 
case show that the SCSA algorithm significantly and con¬ 
sistently outperforms £\ minimization as well as some other 
nonconvex methods in terms of reconstruction error, whereas 
the execution time is at most three times longer than that of 
one of the fastest algorithms in the comparison. Furthermore, 
we show that the SCSA algorithm closely follows the oracle 
estimator ED for a broad range of sparsity levels. 


B. Connections to Previous Work 

The idea of the SCSA algorithm is borrowed from |22) 
that proposes a method for low-rank matrix recovery. In 
this work, we apply the same idea to the sparse recovery 
problem and propose an efficient optimization method for 
the noisy case which is not considered in J22) . Also, the 
underlying idea of SCSA is somehow connected to the ideas 
of the SLO algorithm |[l4j and the algorithm of | [23) . The 
fundamental difference with the SLO algorithm is as follows. 
Let er denote the parameter that reflects the accuracy of £q 
norm approximation for SCSA, SLO, and the method of |23j, 
where a smaller a corresponds to better accuracy. For SCSA, 
we use a class of subadditive functions which enables us to 
analytically prove that, under some conditions, minimizing 
the £q norm approximation for any er > 0 leads to exact or 
accurate recovery in the noiseless or noisy cases. However, 
there is not such a guarantee for SLO except for the asymptotic 
case of er —y 0. This may also justify the performance improve¬ 
ment over SLO demonstrated in our numerical simulations. 
Additionally, as the approximating functions differ, completely 


dissimilar optimization methods are exploited. In comparison 
to |23) , the main distinctions are summarized as: 

• The method of ||23] has been proposed for the noise- 
free case and has been specialized for “reconstruction of 
sparse images,” while SCSA applies to the noisy case as 
well and is designed for the general framework of CS. 

• In this work, theoretical analyses for the asymptotic cases 
of er —► oc and er —► 0 are provided. 

• Here, convergence analysis for the proposed optimization 
methods are also given. 

• |23) solves the associated optimization problem by 
smoothing the approximating functions to make it sim¬ 
pler, while SCSA introduces no smoothing. 

• Finally, in this paper, the proposed initialization is the¬ 
oretically motivated, while |23) intuitively justifies its 
different initialization. 


C. Notations and Outline 

[af| designates the smallest integer greater than or equal to 
x. sign(x) = \x\/x for x € M \ {0}, and sign(0) = 0. The 
sign(-), max(-,-) and min(-,-) functions act component-wise 
on vector inputs. For a vector x, ||x||i and ||x|| denote the 
£\ and £2 norms, respectively, ||x||o denotes the so-called £q 
norm, and Xi represents the <th element. For vectors x and y, 
x > y means that Vi, x, > y, , x©y indicates component-wise 
multiplication, and (x,y) = x T y denotes the inner product. 
X ' denotes the Moore-Penrose pseudoinverse of the matrix X. 
For symmetric matrices Y, Z, Y A Z means Y —Z is positive 
semidefinite. For a positive semidefinite matrix X, A max (X) 
and A m j„(X) denote the largest and smallest eigenvalues. 
null(A) represents the null space of the matrix A. N( 0, S) 
designates the multivariate Gaussian distribution with mean 0 
and covariance X. 

The rest of this paper is structured as follows. In Section [11] 
the main idea of the SCSA algorithm is introduced, and 
Section III explains the optimization methods used in the 
noiseless and noisy cases. In Section |TV} theoretical analyses 
are presented, and, in Section[V] the performance of the SCSA 
method is evaluated and compared against some state-of-the- 
art algorithms by numerical simulations. Section VI concludes 
the paper. 


II. Main idea of the SCSA Algorithm 


A. Motivation 


The problem of finding the sparsest solution of Ax = b 
or the sparsest vector in the set {x | ||Ax — b|| < e} can be 
interpreted as the task of approximating the Rronecker delta 
function (called herein the delta function for brevity). Let 


5{x) 


1 if x = 0 
0 otherwise 


denote the delta function, then the £$ norm of a vector x is 
equal to ||x|| 0 = ^™-Jl — <5(xj)] and can be approximated 
by YhL 1 f( x i ) ' n which / : R —» R. acts as a delta 
approximating (DA) function^ Replacing the £q norm with 


'Clearly, fix) is approximating 1 — S(x), not the delta function. Never¬ 
theless, we call it DA function for the sake of easy referral. 
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Fig. 1. Some algorithms for recovery of sparse vectors implicitly or explicitly 
use different functions to approximate the delta function as ||x||o equals 
yT-, 1 — S(xi). Some of them are plotted in this figure, and among them, 
fa(x) = 1 — e~ \ x \! a more closely matches 1 — 6(x) for a small a. 


the above approximation, the next step is to find a point in the 
feasible set that minimizes Y1T=\ f ( Xi )• To have a numerically 
tractable optimization problem, one needs to find suitable DA 
functions with some appropriate properties like convexity or 
continuity. Based on this formulation, f{x) = |x| promotes 
sparsity with the l-\ norm, and f(x) = |x| p ,0 < p < 1, 
gives rise to i p quasi-norm minimization. Another well-known 
example is log(x) which leads to reweighted l\ minimization 

GU- 


Though t\ minimization enjoys convexity, a number of 
theoretical and numerical analyses show the superiority of 
other DA choices. For instance, 124] and [25 ] show that, at 
least for some p’s in (0,1) and under smaller thresholds on the 
restricted isometry constant (RIC) & globally minimizing 
the t p quasi-norm subject to the linear equations recovers 
sparse vectors uniquely. Also, [26| , [27] prove that, in noisy 
CS and under some mild conditions, even locally minimizing 
the i p quasi-norm via a certain optimization scheme leads to 
recovery of a solution with less error. On the numerical side, 
(13), (24), (28) give some numerical evaluations demonstrating 
the superiority of i p quasi-norm minimization. Moreover, 
G3> ( 29 ), (3D) analyze and compare the theoretical and/or 
numerical performance of reweighted £ 1 minimization to l\ 
minimization showing considerable improvements. 

With this background, one may think whether better ap¬ 
proximations of the delta function lead to higher performance 
in recovery of sparse vectors. In this paper, we use a class 
of DA functions which more closely approximate the delta 
function and show that this intuition is indeed the case. Fig. [T] 
shows the above DA functions as well as 1 — exp(— |x|/tr), 
one of the exploited DA functions in this paper in which a is a 
parameter to control the fitting to the delta function. Obviously, 
by choosing a small enough a, it is possible to have the 
best fit to the delta function among the plotted DA functions. 
Putting this DA function in the sparse recovery problem, we 
are proposing to solve 


min 

X 


J2 (i - ex p(-N/o-)) 

i=1 


s.t. || Ax — b|| < e, (6) 


where e is equal to 0 in the noise-free case, to obtain a solution 
to 0 or 0. 


B. Properties of DA Functions 

To establish theoretical analysis for the proposed algorithm 
and derive efficient optimization methods, we need to impose 
some assumptions on the DA functions, summarized in the 
following property. 

Property 1: Let / : R —► [— 00 , 00 ) and define f a (x) = 
f(x/a) for any 0 > 0. The function / is said to possess 
Property [T] if 

(a) / is real analytic on (xq, 00 ) for some Xq < 0, 

(b) Vx > 0, f"(x) > —c, where c > 0 is some constant, 

(c) / is concave on R, 

(d) f{x) = 0«t = 0, 

(e) lim x _> +00 /(x) = 1 . 

It follows immediately from Property 0 that {/-(M)} 
converges pointwise to 1 — S(x) as 0 —> 0 + ; i.e.. 


lim fa(\x\) 

< T ->0 + 


0 if x = 0 
1 otherwise. 


Besides the DA function f(x) = 1 — e~ x which is mainly 
used in this paper, there are other functions that satisfy 
conditions of Property [T| including 


/(z) 



if X > Xq 

otherwise 


for some — 1 < Xq < 0. 


C. The Main Idea 

To obtain higher performance in recovery of sparse vectors, 
we propose to closely approximate the £q norm and then 
solve the consequent optimization problem. More precisely, let 
/(•) denote a function possessing Property [I] then we define 

-MM) = fa(\xi\) « ||x||o and use 

m 

min (Fo-dxl) = ^ /<r(|xi|)) subject to Ax = b (7) 

i=l 

to find a solution to (3]i and 

minFo-dxl) subject to ||Ax — b|| < e (8) 

X 

to obtain a solution to <0- Expectedly, these optimization 
problems are not convex, and any algorithm may get stuck 
in a local minimum. 

Intuitively, when a is small and F CT (|x|) approximates ||x|| 0 
with a good accuracy, there are many local solutions which 
makes the task of optimizing (7) and (8) very hard. In contrast, 
if a is relatively large, while the accuracy of the approximation 
is not good, fv(|x|) has a smaller number of local minima. 
In line with this, in the asymptotic case, it will be shown that 
<jF a (|x|) becomes convex as a tends to infinity. 

Following the same approach as in (74) , (23), (3T), a 
continuation scheme for solving 0 and (8]i is utilized which 
helps in achieving the sparsest solution of Ax = b and 
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{x | || Ax - b|| < e} instead of finding a local minimum. 
Initially, optimization of 0 or 0 is started with a very large 
value of (j, and the solution is passed as an initial guess to 
the next iteration in which 0 or is solved for a smaller 
value of a. These iterations continue until reaching a desired 
accuracy. 

To further decrease the chance of getting trapped in local 
minima, we constrain f a (-) to be continuous with respect to 
(7 in Property [T] In this fashion, we expect that when u’s at 
two consecutive iterations are close, the global minimizers for 
these two er’s are in the vicinity of each other. Thus, starting 
from a convex optimization and gradually decreasing <r, it is 
more likely that a global solution will be found. 

III. Implementation of the SCSA algorithm 

Contrary to [23) that tries to solve 0 by converting it to an 
unconstrained problem and smoothing the DA function around 
the origin, we solve 0 directly by employing a majorize- 
minimize (MM) technique |32) without smoothing the DA 
function. This approach leads to iteratively solving a few 
weighted t\ minimization (or linear) programs. Inspired by 
the iterative thresholding (IT) technique, we also propose 
two efficient methods for the noisy case of <[8]> which are 
computationally attractive. 

A. Optimization for a Fixed a in the Noise-Free Case 

Although F a (\yi\) is not a differentiable function, by re¬ 
stricting x to be in the positive orthant, we can drop | • | 
from its argument, and make it differentiable. In other words, 
if we look for the sparsest solution which belongs to the 
positive orthant, then we can use f a (x) = 1 — exp(— x/a) 
with the constraint x > 0 in 0 and, in this way, make the 
cost function differentiable. Nevertheless, exploiting the same 
technique as in the conversion of 0 to a linear program 
[llj, we can overcome this restriction. To be precise, let 
y = [xj x^J denote a column vector of length 2 m in 
which x p = max(x, 0) and x m = — min(x, 0). The elements 
of y are nonnegative, F a ( y) = F^dxl), and the constraints 
Ax = b are converted to [A, — A]y = b. By introducing this 
new vector, 0 can be reformulated as 

minify) subject to [A, — A]y = b, y > 0. (9) 

y 

The following theorem proves that, by solving ([9]), we are able 
to optimize program 0. 

Theorem 1: For any class of functions possessing Property 
[0 programs 0 and 0 are equivalent. 

Proof: The proof follows the same lines as in the proof 
of equivalence of A -minimization to a linear program CD- 
Let y* = [u T v T ] , where u,v £ R m , denote an optimal 
solution to 0. To show that 0 and 0 are equivalent, it is 
sufficient to show that the definition of y in 0 is not violated, 
or, mathematically speaking, the supports of u and v do not 
overlap. Assume, to the contrary, that they overlap at index 
k. Without loss of generality, further assume that u k > v k , 
then there is another solution y = [u T v T ] with u = u and 
v = v except for u k = u k — Vk and v k = 0. This new solution 


is feasible since u > 0, v > 0, and [A, —A]y = b. However, 
F a (y) is smaller than F a ( y*) by f a (u k ) + fAv k ) - f a {u k - 
Vk) > 0 which contradicts the optimality of y*. ■ 

Since 0 is a concave program, the MM technique can 
be easily used to find at least a local solution. First-order 
concavity condition for F a {-) implies that 

FAy) < FA y) + <y - y, VF ff (y)> = tf ff (y,y) 

for some y in the feasible set. To apply the MM technique, 
H„ (■, •) is selected as a surrogate function. Consequently, 
neglecting the fixed terms, to obtain a solution to 0, one 
needs to iteratively solve 

yfc+i = argmin { (VF ff ( y fe ), y) | [A, -A]y = b, y > oj 
y 1 J 

(10) 

for k > 0 until convergence. It can be verified that the program 
© is equal to a weighted A minimization 

Xfc+i = argmin {|| Wx||i | Ax = bj, (11) 

where W = diag(VF CT (z)) and z = |xfc|. 

The next proposition, which can be easily deduced from 
|22j . Theorem 3], proves the convergence of the proposed MM 
based approach. 

Proposition 1: The sequence {y/, : } obtained from ( fT0| ) is 
convergent to a local minimum of 0 and F a (y k+1 ) < 
FAy k ) for all k > 0. 

B. Optimization for a Fixed a in the Noisy Case 

A computationally attractive way to find a solution to 
an unconstrained optimization problem, in which the cost 
function is composed of the sum of a smooth and a nonsmooth 
convex function, is to use iterative thresholding or proximal 
algorithms; see 07) for a comprehensive discussion. This kind 
of problems is represented as 

min Ap(x) + L(x), (12) 

X 

where p(x) is convex but possibly nonsmooth, //(x) is convex 
and differentiable with Lipschitz continuous gradient, and A 
is a regularization parameter. Particularly, program 0 can be 
converted to an equivalent unconstrained optimization prob¬ 
lem, known also as the LASSO program |33|. 

minA||x||i + II Ax — b|| 2 , (13) 

X 

where A > 0 is a constant to regularize between solution 
sparsity and consistency to measurements. Accordingly, 0 
can be solved by applying iterative thresholding method on 
O- This special case of proximal methods is known as the 
iterative soft thresholding (1ST) method. More generally, <0 
can be written as 

min A||x||i + 7i(x), (14) 

X 

where 7i(x) is as in ( | 1 2[ ). The 1ST method is aimed for finding 
a solution to ( fl4) by iteratively solving [1 34) 

x fe+ i = argmin |(x-x fc ,Vh(x fc )) + ^-||x-x fc || 2 -FA||x||i|, 
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where p, is a step-size parameter. The above program admits 
a unique closed-form solution given by 

x fc+ i = S Xll (xk - /uV/i(x fe )), (15) 

where S a : R —> R. is the soft thresholding operator defined 
as S a (x) = max(|x| — a, 0) sign(x) for a scalar input and is 
applied component-wise to vectors. 

To utilize the IT method for finding a solution to (|8j, first, 
program ([8]) is formulated as an unconstrained optimization 
problem 

minAo-F^dxl) + h(x), (16) 

X 

where h(x) is as in ( |1 2\ and \„ is a regularization parameter 
similar to © and, in general, may be a function of a. 

Similar to the convex case, © can be optimized by 
iteratively solving 


argmin j(x - x fe , V/i(x fc )) + -E[|x - Xfclp + A^FVdxl)} 

(17) 


until convergence. Ignoring constant terms, it can be verified 
that the program (fl7]> is equal to 


x fe+ i = argmin j — ||x — (x fc -/zV/i(xfc))|| 2 + A^dxl)}. 

(18) 


For a general F a (-), one should use iterative methods to solve 
©■ However, remarkably, for f a (\x\) = 1 — exp(— |a;|/cr), 
we can hnd a closed-from solution using the Lambert W 
function [35] which enables us to efficiently solve ( p~8| ). The 
details of derivation of the closed-form solution as well as the 
corresponding thresholding operator are given in Appendix [A] 
Using this thresholding operator, for any fixed a, our IT based 
approach for solving © simplifies to iteratively updating 
Xfc+i by 

x/c+t = (xfc - /iV/i(xfc)), (19) 

where is given in ( [25] ) in Appendix [a] 

The next theorem whose proof is left for Appendix IB] 
analyzes the convergence of the sequence generated by ( |17| ) 
or ( fl9| . 


Theorem 2: Let M > 0 denote the smallest Lipschitz 
constant of Vh, and let M' be the smallest value such that 
A 0 .V 2 F 0 .(x) y -M’ I,Vx > Op] For any n G (0, the 


“ ^ \ — j — M ^ r V 

sequence {xfc} generated by ( |T7| ) is convergent to a stationary 
point of ( fl~6] ). 

Although the IT technique has a low complexity, it lacks fast 
convergence rate 1341. To speed up the rate of convergence, a 


Nesterov’s like acceleration step is added in 1341 to the 1ST 
method which considerably increases the rate of convergence. 
While the complexity does not boost, improvement in the 
convergence rate is considerable. To accelerate our IT based 
approach, without proving the convergence, we also exploit a 
similar technique based on the FISTA algorithm ]34). We call 


2 For h(x) = ||Ax - b|| 2 and F a (x) = J2iLi 1 “ e~ x *F, M = 
2\max(A- T A) and M' = \ a /cr 2 . 


this instance of the SCSA algorithm fast iterative thresholding 
(FIT) based. 

Now, let us remark on how the regularization parameter A CT 
should be scaled with a. For ( |T6] > with h(x) = ||Ax — b|j 2 , a 
necessary condition for optimality of a point x* is that 

OG2A T Ax*-2A T b + A (T <9F CT (|x*|) (20) 


where <9F CT (|x*|) denotes the Clarke subdifferential [361 of 
F a (| • |) at point x*. Particularly, in the scalar case, df a (\x\) 
for f a (x) = 1 — e~ x !° is given by 


t(M) = < l cr 


rsignfa") _w - 
e a 


if x / 0 
if x = 0. 


J-l/cr, 1/cr] 

Let r, x*, and A r denote the support set of x* and restriction 
of x* and A to the entries and columns indexed by r, 
respectively. Condition (|20|) implies that 


= A t r b-^(A^A 7 


) -1 sign(x*) 0 e 


l*? I 


l a f (A r x* - b)| < 


K 

2a 7 


( 21 ) 


( 22 ) 


where a, : denotes the vth column of A. The second term in 
the right-hand side of ( |2Tj ) disappears when a -+ 0, since the 
exponential terms decay faster than a. Therefore, if r coincides 
with the support set of the true solution, © shows that x* 
tends to the oracle solution, which is obtained by knowing the 
support set of the true solution. However, when a is decreased 
to smaller values, inequality ( [22] ) will be satisfied easier, and 
( fi~6| will give solutions with smaller sparsity levels. In fact, 
\af (A r x*— b)| measures how close are the residual A r x* b 
and the ith column of A; hence, the larger the threshold, the 
larger the number of indices of columns of A to be excluded 
from r. Moreover, the threshold \ a /a should naturally be 
independent of a but dependent on the noise level. Thus, to 
have this threshold independent of a, \ a is scaled linearly 
with cr; i.e., X a = Act for some constant A. 

A closer look at the thresholding operator introduced in 
( [25] ) reveals an interesting interpolation property. Formally, in 
Propositions [2] and [4] we will show how 0 asymptotically 
converges to 1+ and £i minimization when a goes to 0 or oo. 
The thresholding operator 77 (•)> however, also more simply 

illustrates this asymptotical behavior for 1 — exp(— |cc|/cr). 
Figure El displays T x °\-) for cr = 100,1, and 0.1 when 
\ a = Act and A is fixed to 1. In this plot, when a is 
relatively large, (■) is very close to the soft thresholding 
operator, whereas, for a small cr, it is very similar to the hard 
thresholding operator 1371, which according to the formulation 
used in this paper is defined as 


Hx(x) = \x |1| X | >V ^, (23) 

where 1 denotes the indicator function. This shows that when 
cr is swept from very large to smaller values, the thresholding 
operator gradually converts from the soft thresholding operator 
to the hard thresholding operator, making an interpolation 
between ti and £q minimization. 

Finally, it is worth comparing our above proposed approach 
to a few other available methods. In [26] and [38), a multi¬ 
stage convex relaxation method for solving © has been 
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xq 

Fig. 2. 'T\ 7 ^ is plotted for a = 100,1, and 0.1 with A = 1 and A G = Act. 

a / \ 

For the large a, 7y is very close to the soft thresholing operator S\ with 
A = 1, whereas it is very similar to the hard thresholding operator defined in 
(23} with A = 0.1. 


proposed, where the nonconvex func tion F(-) is more general 
than those defined by Property 1 3 When F(-) is aimed for 


sparsity regularization, the proposed optimization method in¬ 
volves solving a number of weighted versions of ( [13} . Namely, 
one needs to iteratively solve 


x fc+ i = argmin|A||W fe x||i + ||Ax - b|| 2 |, 

where /i(x) is assumed to be equal to ||Ax — b|| 2 and W* *, 
is the weighting matrix that depends on x/ ;: (the previous 
solution) and the function F(-) ]26| , [381. In contrast to this 
approach and instead of solving a sequence of optimization 
problems to obtain a solution to ( p~6] >, our proposed method 
directly solves ( [ 16 } which in turn seems to be more compu¬ 
tationally efficient. Moreover, in |39) , a regularization path¬ 
following algorithm for solving ( [ 16 } is introduced which is 
based on the iterative thresholding approach. Along with this 
algorithm, some strong theoretical guarantees are provided for 
a special class of nonconvex regularizations. This algorithm as 
well as its theoretical analyses, however, cannot be applied 
to the DA functions considered herein since, for instance, 
f a (\x\) = 1 — exp(— \x\/a) does not satisfy regularity con¬ 
dition (a) in | [39| . 


Subalgorithm 1 Opt FIT: Optimization of ( [16} for a fixed cr 
using the FISTA like acceleration. 

Input: A, b, A, p, a, x 0 , tol 
Initialization: 

1: yi = x 0 , *1 = 1, k = 0, h(x.) = || Ax - b|| 2 . 

Body: 

1: while d > tol do 
2: k = k+ 1. 

3: x fc a 7ff (yfc - fj,Vh (y k )). 

4: tk+i — (1 + \/l + 4tfc)/2. 

5: yfc+i = x fc + (tk - l)(x fc - Xfc_i)/tfc+i. 

6: d = ||x fc - x fc _i||/||x fc _i||. 

7: end while 

Output: Xfc 


Algorithm 1 The SCSA algorithm 

Input: A, b, A (Only for IT and FIT based instances) 

Initialization: 

1: c G (0, 0.5): decreasing factor for a. 

2: ei, £2 > 0: stopping thresholds for main and internal loops. 

3: h(x) = || Ax - b|| 2 , f a (x) = 1 - e- W/,T . 

{ LP based: xo = argmin x {||x||i|Ax = b}. 

IT or FIT based: xo = argmin x {A||x||i + ||Ax — b|| 2 }. 

5: (Jo = 8 max(|xo|). 

Body: 

1: i = 0, a = (Jo- 
2: while di > ei do 

3: xo = x,, j = 0, i = i + 1, A ct = Act. 

4: while d 2 > £2 do 

5: j=j + 1- 

6: W = diag(VF cr (|xy_i[)). 

{ LP based: xy = argmin x {||Wx||i | Ax = b}. 

IT based: xy = (xy-i - /iVh(xy-i)). 

FIT based: xy = Opt_FIT(A, b, A CT , n, a, xy_i, £ 2 ). 


8: d 2 = ||xy -xy_i||/|[xy_i| 

9: end while 

10: X, = Xy. 

11: di = ||x, — Xi_i || / |[xi_ 1 1|. 

12 : a = ca. 

13: end while 
Output: x, 


C. Initialization 

The following proposition proves that when a —> 00 , a 
scaled version of Fo.(|x|) becomes equal to the Ii norm. Con¬ 
sequently, the MM based instance of the proposed algorithm 
is initialized with a minimum fy-norm solution of Ax = b, 
and the IT and FIT based instances are initialized with the 
solution obtained by the FISTA algorithm. The proof easily 
follows from [22] Theorem 1], 

Proposition 2: For any class of functions {f a } possessing 
Property [T] 

lim -F ct (|x|) = ||x||y, 

(7 — yoo '■y 

where 7 = /'( 0 ) ^ 0 . 

3 In the formulation of |26| , F(-) also does not depend on any scaling 
parameter like a. 


D. The Final Algorithm 

Putting all the above steps together, the final algorithm, 
summarized in Algorithm |T| is obtained by exploiting f a {x) = 
1 — e~ x / a and /i(x) = || Ax — b|| 2 . According to the method 
used in optimizing ([7} or for a fixed cr, three instances 
of the SCSA algorithm are summarized as 

• SCSA-LP (based on linear programming), 

• SCSA-IT (based on IT method), 

• SCSA-FIT (based on FIT method). 

To completely characterize the implementation of this algo¬ 
rithm, the following remarks are in order. 

Remark 1. Parameter cr is decayed by a multiplicative factor 
c which should be chosen in the interval (0,0.5). We will 
discuss how to properly choose it in Section [V] with more 
details. Furthermore, following the same reasoning as in |22) , 
(Tq is set to 8 max(|xo|) because this oq virtually acts as if cr 
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tends to oo. 

Remark 2. d\ = ||x, — Xj_i||/ll x i-r|l and d 2 = ||xj — 
Xj_i||/||xj_i|| measure relative distances between the solu¬ 
tion of successive iterations of the external and internal loops, 
respectively, and are used to stop execution of these loops. In 
the proposed continuation approach, it is not necessary to run 
the internal loop until convergence, and it is just needed to 
get close to the minimizer for the current value of a. Conse¬ 
quently, ei is usually set a few orders of magnitude smaller 
than e 2 . For the noise-free case, = ICC 3 and e 2 = 10~ 2 
are suggested. In the noisy case, since the IT and FIT based 
approaches are relatively slow and the difference between 
two consecutive solutions is not large, (\ and e 2 should be 
chosen smaller. Moreover, as SCSA-IT generally has a slower 
convergence rate, e 2 (the threshold for the internal loop) for 
this instance of SCSA should be smaller than that of SCSA- 
FIT. This choice, as will be shown in numerical simulations, 
leads to a similar performance in terms of reconstruction 
accuracy. e \ and e 2 are also functions of the regularization 
parameter A. Putting altogether, we numerically found that 
a good choice for SCSA-IT is = min(10 -4 ,10~ 3 A) 
and e 2 = min(l(r 4 ,10“ 3 A) and for SCSA-FIT is ei = 
min(10" 4 ,10~ 3 A) and e 2 = min(10- 3 ,10" 2 A). 

Remark 3. Proposition [2] can be strengthened to 

lim argmin {.F 0 .(|x|)|Ax = b} = argmin { ||x||i|Ax = b) 

CT-S-OO x X 

provided that the above £i minimization has a unique solution. 
Nevertheless, since not a strictly convex program, it may occur 
that i\ minimization does not admit a unique solution. Let So 
and S i denote the solution sets of ([3]) and 0, respectively. 
An interesting problem is to characterize the conditions under 
which Sq (which we assume to be singleton) is a subset of 
Si. Given these conditions, one can hope to devise a suitable 
optimization algorithm which is theoretically guaranteed to 
start from a point in Si and end up in the unique solution 
of (3). In this fashion, the guaranteed recovery bounds for £j 
minimization can be improved. 

Remark 4. As discussed earlier, X a is set to Act for some 
A > 0. This choice can be also justified as follows. Since 
SCSA in the noisy case is initialized with a solution to €3 
and limcr-j.oo uFo-dxl) = ||x||i, it is natural to set X a = Xa 
to have the same cost function in (fl3| and (f]~6) for a —> oo. 


IV. Theoretical Analysis 


A thorough performance analysis of the SCSA algorithm 
considering all of its steps seems to be very hard and cannot be 
embedded in this paper. In this section, however, by analyzing 
programs 0 and 0 for any a > 0 and/or a tending to 0 + , 
we provide simplified analyses which will give the reader a 
theoretical insight about how the main idea works. These 
analyses are simply extracted from the results in (19). (22), 
©, ED and are based on the null-space (T9), restricted 
isometry (18), and spherical section properties (20), (42) of 
the sensing matrix. We recall or modify them in order to be 
able to study the performance of the proposed algorithm. 

Null-space based recovery conditions: It can be verified that 
Property [T| implies the ‘sparseness measure’ definition in [ (T9) . 


Consequently, based on Theorems 2 and 3 and Lemma 4 of 
COD’ a necessary and sufficient condition for exact recovery 
of sparse vectors via (7) is as follows. Let us define 


8 f (s A) A sup 

henull(A)\{0} 2^i=lJcr\ n i) 


where h\ denotes the ith largest (in magnitude) component of 
h. 0f a (s, A) < 1/2 is a necessary and sufficient condition 
for exact recovery of all vectors with sparsity at most s. 
These conditions are weaker than those corresponding to l\ 
minimization and lead to the following proposition which is a 
special case of GU Proposition 5]. 

Proposition 3: Let m/ (A), m| o (A), and m^ i ( A) denote 
the maximum sparsity such that all vectors x with ||x||o < 
m*^{ A), ||x|| 0 < ml (A), and j|x|| 0 < m* ti { A) can be 
uniquely recovered by 0 , (3) , and (5), respectively. For any 
/ct(’) possessing Property fl] 

"4(A) < "4(A) < m| 0 (A). 


The so-called robust recovery condition is satisfied if all 
vectors with sparsity at most s can be recovered from (8) with 
an error proportional to e > ||w|| (4l) . In general, extension 
of the above necessary and sufficient conditions to robust 
recovery of sparse vectors from noisy measurement is not 
easy. However, |41|| proves that, under some mild assumptions, 
the sets of sensing matrices satisfying the exact and robust 
recovery conditions differ by a set of measure zero. In other 
words, 0/^(s, A) < 1/2 also guarantees the accurate recovery 
of sparse vectors via (8) for most sensing matrices. 

Restricted isometry property based conditions: Let <5 2s and 
S 3s denote the restricted isometry constants of orders 2s and 
3s defined in 1181. For a general concave function f a (■ ) (in¬ 
cluding those satisfying Property[T), (40) shows that <i 2iS <1/2 
and S 3s <2/3 are sufficient for accurate recovery of a sparse 
vector with the fo-norm not greater than s via 0- 

Spherical section property based conditions: While the 
above recovery conditions do not provide strict superiority to 
l\ minimization, the following proposition shows that, in the 
noise-free case, one can obtain a solution arbitrarily close to 
the unique solution of / 0 minimization by properly choosing 
a. Indeed, as long as l ! o minimization admits a unique solution, 
it is possible to recover it by 0 letting cr —> 0. However, it is 
obvious that, for sufficiently sparse vectors, df tr (s, A) < 1/2 
guarantees exact recovery for any a > 0. To state the result, 
first, the definition of the spherical section property is recalled. 

Definition 1 (Spherical Section Property /|20) , l[42\j ): The 
sensing matrix A possesses the A-spherical section property 
if, for all w S null(A) \ {0}, ||w||f/||w|| 2 > A(A). In 


other words, the spherical section constant of the matrix 
defined as 


A(A) = min 

w£null(A)\{0} 11W| 


2 ' 


A is 


It is known that many randomly generated sensing matrices 
possess the SSP with high probability (42) . The proof of the 
following proposition easily follows from Proposition 4 of | [22 ) 
by restricting the matrices to be diagonal. 





Proposition 4: Assume that A £ K nxm has the A- 
spherical property, and consider a class of functions {/ CT } 
possessing Property [T| Let x CT denote a solution to ([TJ. and 
let Xq denote the unique solution to ([3]). Then 


|x<r - x 0 || < 


na a 




where a a = \f a 1 (1 — ^)|. The above inequality also leads to 


lim Xg. = x 0 . 

G— >-0+ 


V. Numerical Experiments 

To assess the effectiveness of the SCSA algorithm in 
recovering sparse vectors, a number of numerical experiments 
are performed. Initially, the effect of parameter c is examined, 
and a suitable choice for this parameter is proposed. Next, 
the performance of SCSA in noiseless and noisy settings 
is evaluated and compared to some of the state-of-the-art 
algorithms. The general experimental setups as well as specific 
settings for the noise-free and noisy cases are described in the 
following subsection. 


A. Experimental Setups 

We use randomly generated sparse vectors and sensing 
matrices in all numerical experiments. More specifically, fol¬ 
lowing common practice, each entry of the sensing matrix A 
is generated independently from the zero-mean, unit-variance 
Gaussian distribution N( 0, 1), and the columns of A are 
normalized to have unit £ 2 -norm. To construct a sparse vector 
x with ||x||o = s, first, the location of nonzero components 
is sampled uniformly at random among all possible subsets 
of {1, • • ■ , to} with cardinality s. Then the values of nonzero 
components are drawn independently from either N( 0,1) or 
the Rademacher distribution of {±1} with equal probability. 
Moreover, when measurements are noisy, the noise vector 
w is always drawn from 7V(0, cr^I). Finally, the vector of 
measurements b is equal to Ax + w, where w = 0 when the 
noise-free case is under consideration. 

Let x denote the output of one of the algorithms used in 
the numerical experiments to recover the sparse vector x from 
either noisy or noiseless measurements. We use the following 
four quantities to measure and compare reconstruction accu¬ 
racy in different experiments. 

• Reconstruction SNR in dB: 

SNR rec A 20log 10 (||x||/||x — x||) which will be used in 
Experiment 1 and, implicitly, in Experiment 2. 

• Median reconstruction SNR: 

MSNR rec — 101og 10 (||x|| 2 /median(||x — x|| 2 )) where 
median(||x — x|| 2 ) denotes the median of ||x — x|| 2 
over all the Monte-Carlo simulations. MSNR rec is used 
in Experiments 3 and 4. 

• Support recovery rate (SRR): 

Let t and t denote the support set of x and the set of 
indices of the s largest (in magnitude) components of x, 
respectively. SRR is defined as the number of realizations 
in which t — f normalized by the total number of Monte- 
Carlo simulations. SRR is used in Experiment 4. 


• Mean-squared error (MSE) which is the sample mean of 
||x — x|| 2 and will be used in Experiment 5. 

Besides the accuracy, execution time, as a rough measure of 
the computational complexity, is used to compare algorithms. 
All simulations are performed in MATLAB 8 environment 
using an Intel Core i7-4600U, 2.1 GHz processor with 8 GB 
of RAM, under Microsoft Windows 7 operating system. 

1) noise-free case: An algorithm is declared to be success¬ 
ful in recovering the solution, if SNR lec > 60 dB. Conse¬ 
quently, to compare the performance of different algorithms, 
we use the success rate debited as the number of times an 
algorithm successfully recovers the solution divided by the 
total number of trials. To solve Q and ( [TTj ), where the latter 
is also used in the implementation of some of the algorithms in 
the comparison with an algorithm-dependent weighting matrix, 
we use the i \-magic |43|. 

2) noisy case: We assume that the noise variance, cr 2 , is 
known; thus, it is possible to use the following formula 

0 

A = 2 6 ,^*- 1 (l--^) (24) 

ZTO 


to choose the regularization parameter for the LASSO es¬ 
timator •d- With the above choice, in which c r > 1 is 
some constant and $ is the cumulative density function of 
N( 0,1), the LASSO estimator achieves the so-called ‘near¬ 
oracle’ performance with probability at least 1 — a r j44) . [}] 
In our numerical experiments, c r and a r are set to 1.05 and 
0.5, respectively. The above regularization parameter is used 
for 1ST and FISTA based implementations of the LASSO as 
well as IT and FIT instances of the SCSA algorithm. 

Finally, it should be mentioned that, to have more stable 
plots without large huctuations, in the noisy case, we normal¬ 
ize the i -2 norm of each s-sparse vector to y/s. 


B. Effect of Parameter c 

Experiment 1. To increase the accuracy of approximating 
the /:'() norm in SCSA, a is decreased according to the rule 
cri = coi- 1 , i > 1. Intuitively, a small value for c corresponds 
to fast decay of o and increase in the risk of getting trapped 
in local solutions. In contrast, a relatively large value of c 
leads to smooth changes in F„f) where it is more likely to 
end up in the global solution. In the simplest case, where the 
sparsity of the solution is small enough and measurements 
are noise-free, the sufficient condition stated in Section [IV] 
implies that ([5} and 0 have the same unique solution. As 
SCSA is initialized with the minimum I^-norm solution and 
Proposition |T| proves that, for any cr, F a f), in the internal 
loop, is nonincreasing, the SCSA algorithm converges after 2 
iterations independent of c. Moreover, in the noisy case, from 
the theoretical analysis presented in Section [TV] and the proof 
of Theorem [2] which shows that A 0 -F cr (|xi|) + /i(x^) is not 
increasing in i, we expect a similar behavior. On the other 
hand, when the sparsity level increases, to decrease the risk of 
getting trapped in local minima, a larger c should be selected. 

4 As will be explained later in Subsection |V-P| with this choice of a r 
which is the same as in [45| , we are evaluating the typical performance of 
the LASSO, SCSA-IT, and SCSA-FIT algorithms. 
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Fig. 3. Averaged SNR re c of the SCSA algorithm in solving noiseless 
and noisy problems are plotted versus c for 3 different sparsity levels. The 
sparse vector is of length 400, and the sensing matrix is 200 X 400. To 
have an accurate estimate of the SNR re c, in each problem, 100 Monte-Carlo 
simulations are run, and results are averaged. 

To see the above intuition, the effect of parameter c in the 
reconstruction SNR in noisy and noiseless cases is numerically 
experimented. The dimensions of the sensing matrix are fixed 
to 200 x 400, a w is equal to 1CT 3 , and the sparse vectors are 
Gaussian distributed. The experiment is repeated for 3 different 
cardinalities (s = 5,100,105), and SNR rec is averaged over 
100 trials. Fig. [3]shows the averaged SNR rec ’s as a function of 
c. As predicted, when s is small, the SNR rec is always high. On 
the other hand, for high sparsity, after passing a critical value, 
SNR rec remains almost unchanged. Based on this observation, 
in the remaining experiments, we conservatively set c to 0.1. 

C. Noise-Free Recovery 

Experiment 2. In this experiment, the performance of the 
SCSA algorithm in the noise-free setting is compared to t\ 
minimization, the SL0 algorithm (l4| , ( p quasi-norm mini¬ 
mization & and the reweighted minimization GD 
in terms of success rate and execution time. The following 
implementation method and parameters are used for each 
algorithm. 

• When implementing the SCSA algorithm, e\ and &> are 
set to 10” 3 and ICC 2 , respectively, and c is set to 0.1. 

• For SL0 (MATLAB code: http://ee.sharif.edu/~SLzero/), 
the following parameters are used: sigma_min=10 -4 , 
c=0. 8, mu=2, and L=8. As suggested in (~j~4) , these 
parameters result in a much better success rate than that 
provided by the default values. 

• tp quasi-norm minimization is implemented based on the 
iteratively reweighted t\ minimization approach in (46) 
with p = 0.5. 

• For the reweighted l\ minimization (12), we use e = 0.1 
which attains the best performance Q2|. 

The dimensions of the sensing matrix are fixed to 250 x 
500, the sparsity of x is changed from 70 to 170, and success 
rate and average execution time over 500 trials are plotted in 




Fig. 4. Comparison of the SCSA algorithm in solving the noise-free problem 
to t\ minimization, the SL0 algorithm |14| , i p quasi-norm minimization 
(BJ (p = 0.5), and the reweighted t\ minimization |12|| in terms of success 
rate and execution time. The dimensions of the sensing matrix are 250 x 500. 
Trials are repeated 500 times, and results are averaged over them. 

Fig. 0 As depicted in this figure, SCSA has the best success 
rate amongst the algorithms, whereas its computational load is 
much higher than the closest competitor. However, as we show 
shortly, in the noisy setting which is more realistic, SCSA 
maintains the superiority with a quite reasonable complexity. 

D. Recovery from Noisy Measurements 

In the following experiments, superiority of the proposed 
algorithm in the noisy setting is demonstrated. Toward this 
end, the SCSA algorithm is compared with the oracle estimator 
HD, which knows the location of the nonzero elements of the 
true solution, LASSO (or BPDN), the SCAD penalty (47) , 
Robust SL0 (48) , the method of (49) , (50) , and iterative log 
thresholding (ILT) (51). The description and implementation 
details of these algorithm are as follows. 

To implement the LASSO estimator, 1ST and/or FISTA 
methods are used. The smoothly clipped absolute deviation 
(SCAD) penalty is a well-known nonconvex function that 
promotes sparsity more tightly than the £\ norm does and has 
some oracular properties ED- For this penalty, we set the 
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parameter a to 3.7 as suggested in (47). To efficiently solve 
the optimization problem resulting from the SCAD penalty, 
we use the algorithm of [52] with parameter r = 10~ 3 and 
the external loop stopping threshold equal to 10 -4 . [^] Robust 
SLO (RSLO), a modification to the original SLO to handle noisy 
measurements, instead of a regularization parameter, needs the 
noise variance in the denoising step. To have a fair comparison, 
a w is passed to it. Other parameters of RSLO are the same 
as in the Experiment 2 except for sigma_min=cr l( ,/10. The 
method of Eg, go), which we refer to as IS'l-p, uses a 
generalized version of the soft thresholding operator, S a , 
defined as 

Sjv\ x ) = max(|a;| — a|x| p_1 ,0) sign(cc); 


otherwise, it is identical to the 1ST. We use two instances of 
this method with p = 0.5 and p = 0.1. The ILT algorithm 
is an extension of the reweighted t\ minimization in fl2j to 
the noisy case. In fact, it solves the following optimization 
problem 

m 

min A V'log(|a:;| + 0) + ||Ax - b|| 2 , 

X z —* 

i =1 

in which /3 is some small constant to ensure positivity of the 
argument of log(-), using iterative thresholding approach. 

For SCSA-FIT, SCSA-IT, 1ST, and FISTA, the regulariza¬ 
tion parameter is chosen according to the formula given in 
For ILT, SCAD, and IST-p, the regularization parameter 
is numerically tuned at a w = 10 -2 and is linearly scaled 
with the change of noise standard deviation. The stopping 
criterion for 1ST, FISTA, IST-p, and ILT is d = | x, 
Xj_i||/||xj_i|| < p, where X; and x^_i are the solutions at 
the ith and (i — l)th iterations, p = min(10 _3 A, 10~ 4 ), and 
A is the associated regularization parameter. For SCSA-FIT 
and SCSA-IT, e t and e 2 are set as suggested in Remark 2. 
Moreover, for 1ST, FISTA, ILT, and IST-p, // is always fixed 
to 0.99/(2A maa; (A T A)), while for SCSA-FIT and SCSA-IT, 
p is 0.99/(2A mox (A T A) + X/a). 

To obtain accurate and stable results, similar to ED, 
MSNR rec is used in Experiments 3 and 4 to compare the 
performance of the algorithms. In fact, with MSNR lec , we are 
comparing the ‘typical’ performance of these algorithms as 
most performance guarantees for the LASSO estimator and 
other nonconvex estimators hold with some probability (see 

e.g., (26), ED, ED). 

Experiment 3. Under the above conditions, for Gaussian 
distributed sparse vectors, s is changed from 2 to 160, and 
the MSNR rec and the averaged execution time (except for 
the oracle estimator) for all the algorithms over 500 runs are 
plotted in Fig. [D As clearly demonstrated in this figure, the 
(median) reconstruction SNR of the SCSA-FIT algorithm, for 
almost all values of s, is higher than others. Also, it has a 
near-oracle performance for a broader range of sparsity levels. 
So far as the computational load is concerned, SCSA-FIT 
needs at most (approximately) 3 times higher execution time 
in comparison to FISTA, the fastest algorithm for most of 
sparsity levels. However, the computational cost is lower than 


5 MATLAB code: https://sites.google.com/site/alainrakotomamonjy/ 




Fig. 5. Comparison of two variants of the SCSA algorithm in solving the 
noisy problem to 1ST, FISTA, SCAD (47], ILT FT], RSLO |[48), IST-p (49) , 
|50| , and the oracle estimator HD in terms of MSNR 1CC and execution time. 
The dimensions of the sensing matrix are 250 X 500. The sparse vectors are 
Gaussian distributed, and cr w = 10 4 Trials are repeated 500 times, and 
results are averaged over them. 


that of SCAD and RSLO, when s is larger than 70 and is 
smaller than 100, respectively. These two algorithms (SCAD 
and RSLO) are somehow the best competitors; however, they 
are not able to follow the oracular performance at the sparsity 
level that SCSA does. 

SCSA-IT and SCSA-FIT have a quite similar performance 
in terms of MSNR lec . However, as expected, the former spends 
considerably more time than the latter to output a solution. 
Particularly, SCSA-FIT is approximately 8 times faster than 
SCSA-IT, when sparsity level is equal to 140. 

Experiment 4. To show that the SCSA algorithm is effective 
for sparse vectors which are not Gaussian-like distributed, 
we repeat Experiment 3 with the Rademacher distributed 
signals. The Rademacher distributed sparse vectors do not 
exhibit the power-law decaying behavior (43) when nonzero 
components are sorted according to their magnitude. In ad¬ 
dition, some numerical simulations show that SLO, which 
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parallels the idea of SCSA, does not work well for this kind 
of distributions (see nuit-blanche.blogspot.com/2011/ll/post- 
peer-review-of-sl0.html). 

In many applications, it is more crucial to find the support 
set of the sparse vector accurately (53) . To numerically asses 
the performance of SCSA in recovering the true support, we 
also calculate the SRR in this experiment. Fig. [6] illustrates 
the results of the experiment for SCSA as well as all other 
algorithms in Experiment 3. As shown in this figure, SCSA 
and some other algorithms follow the oracle performance more 
closely than they did in Experiment 3. Furthermore, SCSA- 
FIT achieves the best performance in terms of MSNR lec and 
SRR, whereas its complexity is quite comparable to FISTA. 

Experiment 5. In this experiment, we examine and compare 
the accuracy of the SCSA-FIT algorithm in terms of MSE 
to FISTA, ILT, IST-0.5, SLO, and the oracle estimator as 
the noise variance changes. Under the same conditions as 
in Experiment 3 and for 3 sparsity levels (s = 10, 50,105), 
the noise standard deviation is changed from 10 -1 to 10 -3 , 
and the MSE is calculated. Results of this experiment are 
summarized in Fig. [7] As clearly depicted, for all values of s, 
SCSA-FIT is the closest one to the oracle estimator, and can 
follow it even for the large sparsity level of s = 105. 

VI. Conclusion 

For the problem of recovering sparse vectors from com¬ 
pressed measurements, we proposed to replace the £q norm 
with a family of concave functions that closely approximates 
sparsity. To solve the consequent nonconvex optimization 
problem, we exploited a continuation approach leading to start¬ 
ing from £i minimization, followed by successively solving 
accuracy-increasing approximations of the norm subject to 
the constraints. In the presence of noise in the measurements, 
we combined the continuation approach to iterative threshold¬ 
ing method of optimization and proposed a computationally 
inexpensive algorithm. This was obtained by deriving the 
closed-form solution for the program ( p~8| >. We also provided a 
choice for the regularization parameter in ( fl6| which is based 
on the regularization parameter for the LASSO estimator. The 
numerical simulations revealed that the proposed method con¬ 
siderably and consistently outperforms some of the common 
algorithms (including LASSO), especially when the sparsity 
level increases. 


Appendix A 

Before deriving the closed-form solution to ( fl8| for 
fa(\x\) = 1 — exp(— \x\/a), we need to introduce the Lambert 
W function and prove an inequality on this function. 

The Lambert W function, which has several applications 
in physics and applied and pure mathematics (particularly, 
combinatorics) [35j, is defined as the multivalued inverse of 
the function w i-> we w (35). More precisely, the Lambert W 
function, denoted by W(x), is given implicitly by 

W(x)e wM = x 

where x in general can be a complex number, but, herein, we 
only deal with real-valued W and assume x to be real. In 


this fashion, W(x) is single-valued for x > 0 and is double¬ 
valued for — - < x < 0 135]. To discriminate between the two 
branches for x £ [—^,0), we use the same notation as in 1351 
and denote the branch satisfying W(x) > — 1 and W(x) < — 1 
by Wo(x ) and W-i(x), respectively. 

Lemma 1: For any y £ [—A,0), Wo(y) + W-\{y) < —2. 

Proof: Simple algebraic manipulation shows that the 
parabola e~ 1 (\x 2 + x — i) is below the function xe x for 
x £ (—1,0) and above it for x < —1. Therefore, any line 
parallel to the x-axis with the y-intercept in the interval y = 
[— -, 0), intersects the parabola at equal or larger x coordinates 
than those corresponding to intersections with xe x . From the 
other side, the x coordinate of the intersection points of these 
lines with xe x gives Wq( y) and W-\{y). Consequently, for 
any y £ [— -T, 0), the sum Wo(y) + W_i(y) is less than or 
equal to the sum of the roots of e~ 1 (^x 2 + x — |) = y which 
is always equal to —2. This completes the proof. ■ 

To obtain the solution to GD- we first notice that since 
F a {-) is a separable function, the minimizer can be obtained 
element by element. Therefore, we focus on the one-variable 
optimization problem. To that end, let L(x,xo) = t^(x — 
xq) 2 + Xf a (\x\) denote the corresponding scalar version of 
© and let x* = argmin^. L(x, Xo). It is easy to show that 
L(x, —x o) = L(— x, xq); thus, from 


argmin{L(x, Xo)} for xo > 0 

X 

— argmin{L(—x,Xo)} for Xq < 0, 

X 


we get 

x* = sign(xo) argmin 

X 

= sign(xo) argmin 

X 


{^(si gn (zo)x - x 0 ) 2 + A/ ct (|x|)|, 
{^0 - M) 2 + A/ ct (|x|)}. 


Let y* = argmin x {^(x— |xo|) 2 + A/ CT (|x|)}. This definition 
implies that y* > 0, since if y* < 0, then y = —y* 
contradicts the optimality of y* as (y — |xo|) 2 < (y* — |xo|) 2 . 
Consequently, we have 

x* = sign(xo) argmin j ^-(x - |x 0 |) 2 + A/ CT (x) | x > oj. 


Denoting L(x,x 0 ) = ^(x - |x 0 |) 2 + A/ CT (x), putting 
f a {x) = 1 — exp(— x/a), and differentiating L(x,x 0 ) with 
respect to x, it can be obtained 



M) 


A x — |xo| °=—i°=oi 

-e ==> -e a 

a a 


The above equation admits two solutions 



fxi = aWo(—^e~^) + |x 0 | 

|x 2 = CTfU_i(-^e~' L ^ i ) + |x Q |. 

Since, for x < — ^, W(x) is not defined, we should have 
> _I or |x 0 | > <t(1 + ln(/rA) - 21n(cr)); 
otherwise, the minimizer of L[x,x o) lies at x = 0. 
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Fig. 7. The mean-squared errors of SCSA-FIT, FISTA, ILT, IST-0.5, RSLO, and the oracle estimator in recovering Gaussian distributed sparse vectors from 
noisy measurements as a function of the noise standard deviation. The dimensions of the sensing matrix are 250 X 500. Trials are repeated 500 times, and 
results are averaged over them, (a), (b), and (c) correspond to s = 10, 50,105, respectively. 


Substituting X\ and X 2 in L(x,x 0 ), letting 2 = — 
and using the definition W(x)e w ' x ' = x, we have 


L(xi,xo) — A + —( 2^0 ( z ) + Wo( 2 ))> 
L(x 2 , xq) = A + — ^-Wli(z) + W-i(z)^. 


Since /i(x) has an M -Lipschitz continuous gradient, we have 

m 

h(x k+ 1 ) < /i(x fe ) + (x fc+ i - x fc , V/i(x fc )) + ~ ||x fc+ i - x fc || 2 . 

(27) 

Moreover, V 2 0(x) >; —M'l for x > 0, implies that, for all 
x,y>o, 


From Lemma [I] we have 

(W 0 (z) + W. 1 {z) < -2 
\Wo(2) > -1 

=> 0 < W 0 (z) + 1 < -1 - W_i(z) 

=* + Wo(2) < ^W\{z) + W-!(z). 

This shows that x 2 cannot be a minimizer of L(x,x 0 ). In 
addition, it can be easily checked that L{ x,xq) foi' x > 0 is 
convex only when a 2 > /iA. Therefore, it can occur that the 
minimizer resides at the border (x = 0). It is very hard to 
analytically find the condition under which x = X\ or x = 0 
is the minimizer. However, one can easily compare the cost 
function at these points to find the minimizer. In summary, the 
one dimensional shrinkage operator can be characterized as 

fO |x 0 | > cr(l + ln(/rA/cr 2 )) 

Tp\{ x a) = < 0 L(xi,x 0 ) > L(0,x o ) 

l aWo(z) + Ixol otherwise, 

(25) 

, uA l x ol 

where z = — * . 

(T z 


Appendix B 
Proof of Theorem[2] 

For the sake of simplicity, let us define </>(x) = A <T F cr (x), 
where, without introducing any ambiguity, we omitted the 
subscript cr. Further, let g(x) = h(x) + </>(|x|) denote the cost 
function in The program (fTT) now is equal to 


x fe+ i = argmin|(x-x fc , Vft(x fe )) + —||x-x fc || 2 +^(|x|)|. 

(26) 


M' 

0(x) < 0(y) + (x - y, V</>(x)) + — ||y - x|| 2 . 

Substituting x and y with |xfc+i| and |x^|, respectively, we 
get 


0(|x fc +i|) < </>(|x fc |) + (|x fc+1 | - |x fc |, V0(|x fe+1 |)) 

M 1 

+ —|||x fc+1 |-|x fe ||| 2 . 

Using || |x| — |y||| < ||x — y||, the above inequality resorts to 


<K|x fe+ i|) < </>(|x fc |) + (|x fe+ i| - |x fe |, V^(|x fe+ i|)) 

M' 

+ — l|xfc+i — x fc || 2 . (28) 

From the other side, x/, : + 1 is a minimizer of ( |26| >, so we can 
write 


(x fc+ i-Xfc, Vft(x fc )) < 0(|x fc |)-^(|x fe+ i|)-^-||x fc+ i-x fc || 2 . 

(29) 

First-order concavity condition for (f> implies that e»(y) < 
<t>(x) + (y — x, V^>(x)). Replacing x and y with |xfc +1 | and 
|xfc|, respectively, we get 


<?K|xfc|)-<X|x fc+ i|) + <|x fc+ i| — |x fc |, V0(|x fc+ i|)) < 0. (30) 
Combining ( |27| and ( |28| > results in 
fl(Xfc +1 ) - ff(x fc ) 

< (Xfe+i - x fc , Vft(xfe)) + (|xfc_|_i| - |xfc|, V0(|x fc+ i|)) 

+ + A^OlIxfc+i — Xfc || 2 

(a) 

< </>(|x fe |) - 0(|Xfc_|_i |) + (|x fc+1 | - |Xfc|, V0(|x fc+1 |)) 

+ -(M + M' - )||xfc+i — Xfc|| 2 , 

2 /1 


( 31 ) 
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where (a) follows from ( |29| . Assume that 0 < /i < M + M , , 
then © rearranges to 

- M- M')||x fe+1 -x fe || 2 

2 fi 

< ff(xfc) - g(x k+ 1 ) + </>(|x fc |) - ^(|x fe+1 |) 
+(|xfe+i| - |x fc |, V0(|x fc+ i|) 

(b) 

< 5 (xfc)- 5 (x fc+ i), (32) 

where (b) follows from a. 

Now, we show that (g(xfc)} is a nonincreasing sequence. 
Similar to ( [27] ), one has 

h.(x) < ft(x fc ) + (x - Xfc, V/l(x fe )) + y ||x - x fe || 2 . 

As a result, if g € (0, /j), then /i(x) < if (X, Xfc) = ft(Xfc) + 
(x - Xfc, V/i(xfc)) + ^||x - Xfc|| 2 , or, equivalently, 

<7( x ) < ff(x, Xfc) + 0( |x| ) Vx. 

Replacing x with x fc+ i, we get g(x fe+ i) < ff(x fc+ i,Xfc) + 
(/>(|xfc + i|). Furthermore, Xfc +1 is a solution to ([26]), so 
ff(xfc + i,Xfc) + </>(|x fc+ i|) < ff(xfc,Xfc) + 0(|xfc|) = g(xfc). 

This inequality together with the previous one leads to 


5 (xfc+i) < g{x k ) Vfc > 0. 


Summing ( |32[ ) over k = 0, • • ■ , N, we get 
1 1 N 

-(- - ||x fc+1 - Xfc || 2 < g(x 0 ) - g(x N+ i). 

Z >— n 


Since <?(x 0 ) is hnite, we conclude that {xj,} is convergent. 
Next, we prove that {x^.} converges to a stationary point of 
©■ Suppose that {xfc} converges to x*. It can be verified 
that the cost function in ( |26| ) is strictly convex for // < 1/M; 
thus, the minimizer of ( [26] ) is unique. This fact together with 
155 Thm. 1.17 and Thm. 7.41] implies that when k —> oo, we 


have 


x = argmin • 


(x-x*, V/i(x*)) + 2-||x-x*|| 2 ■ 


<KI X I)}- 


First-order optimality condition of the above program leads to 

0 £ V/i(x*) + — (x — x*) + <9</(|x|) atx = x*, 

F 

proving that x* is a stationary point of (p~6]). ■ 


References 

[1] D. Gross, Y. K. Liu, S. T. Flammia, S. Becker, and J. Eisert, “Quantum 
state tomography via compressed sensing,” Physical review letters, vol. 
105, no. 15, pp. 150401, 2010. 

[2] Y. Wiaux, L. Jacques, G. Puy, A. Scaife, and P. Vandergheynst, “Com¬ 
pressed sensing imaging techniques for radio interferometry,” Mon. Not. 
R. Astron. Soc., vol. 395, no. 3, pp. 1733-1742, 2009. 

[3] D. Koslicki, S. Foucart, and G. Rosen, “Quikr: a method for rapid 
reconstruction of bacterial communities via compressive sensing,” Bioin¬ 
formatics, vol. 29, no. 17, pp. 2096-2102, 2013. 

[4] Y. Erlich, A. Gordon, M. Brand, G. Hannon, and P. Mitra, “Compressed 
genotyping,” IEEE Trans. Inf. Theory, vol. 56, no. 2, pp. 706-723, 2010. 

[5] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application 
of compressed sensing for rapid MR imaging,” Magnetic resonance in 
medicine, vol. 58, no. 6, pp. 1182-1195, 2007. 


[6] Y. Eldar, “Compressed sensing of analog signals in shift-invariant 
spaces,” IEEE Trans. Signal Process., vol. 57, no. 8, pp. 2986-2997, 
2009. 

[7] I Bilik, “Spatial compressive sensing for direction-of-arrival estimation 
of multiple sources using dynamic sensor arrays,” IEEE Trans. Aerosp. 
Electron. Syst., vol. 47, no. 3, pp. 1754-1769, 2011. 

[8] M. Herman and T. Strohmer, “High-resolution radar via compressed 
sensing,” IEEE Trans. Signal Process., vol. 57, no. 6, pp. 2275-2284, 
2009. 

[9] D. Donoho, M. Elad, and V. Temlyakov, “Stable recovery of sparse 
overcomplete representations in the presence of noise,” IEEE Trans. 
Inf. Theory, vol. 52, no. 1, pp. 6-18, 2006. 

[10] M. Babaie-Zadeh and C. Jutten, “On the stable recovery of the sparsest 
overcomplete representations in presence of noise,” IEEE Trans. Signal 
Process., vol. 58, no. 10, pp. 5396-5400, 2010. 

[11] M. Elad, Sparse and redundant representations: from theory to appli¬ 
cations in signal and image processing. Springer, 2010. 

[12] E.J. Candes, M.B. Wakin, and S.R Boyd, “Enhancing sparsity by 
reweighted I\ minimization,” Journal of Fourier analysis and appli¬ 
cations, vol. 14, no. 5-6, pp. 877-905, 2008. 

[13] R. Chartrand, “Exact reconstruction of sparse signals via nonconvex 
minimization,” IEEE Signal Process. Lett., vol. 14, no. 10, pp. 707- 
710, 2007. 

[14] H. Mohimani, M. Babaie-Zadeh, and C. Jutten, “A fast approach for 
overcomplete sparse decomposition based on smoothed norm,” IEEE 
Trans. Signal Process., vol. 57, no. 1, pp. 289-301, 2009. 

[15] S. Foucart and M. Lai, “Sparsest solutions of underdetermined linear 
systems via i q -minimization for 0 < q < 1,” Appl. Comput. Harmon. 
Anal., vol. 26, no. 3, pp. 395-407, 2009. 

[16] S. Rangan, “Generalized approximate message passing for estimation 
with random linear mixing,” in Proc. IEEE Int. Symp. Inform. Theory, 
2011, pp. 2168-2172. 

[17] RL. Combettes and J. Pesquet, “Proximal splitting methods in signal 
processing,” in Fixed-point algorithms for inverse problems in science 
and engineering, pp. 185-212. Springer, 2011. 

[18] E. J. Candes, “The restricted isometry property and its implications for 
compressed sensing,” Comptes Rendus Mathematique, vol. 346, no. 9, 
pp. 589-592, 2008. 

[19] R. Gribonval and M. Nielsen, “Highly sparse representations from 
dictionaries are unique and independent of the sparseness measure,” 
Appl. Comput. Harmon. Anal., vol. 22, pp. 335-355, 2007. 

[20] B.S. Kashin and V.N. Temlyakov, “A remark on compressed sensing,” 
Mathematical notes, vol. 82, no. 5-6, pp. 748-755, 2007. 

[21] E. Candes and T. Tao, “The dantzig selector: Statistical estimation when 
p is much larger than n,” The Annals of Statistics, pp. 2313-2351, 2007. 

[22] M. Malek-Mohammadi, M. Babaie-Zadeh, and M. Skoglund, “Iterative 
concave rank approximation for recovering low-rank matrices,” IEEE 
Trans. Signal Process., vol. 62, no. 20, pp. 5213-5226, 2014. 

[23] J. Trzasko and A. Manduca, “Highly undersampled magnetic resonance 
image reconstruction via homotopic-minimization,” IEEE Transactions 
on Medical imaging, vol. 28, no. 1, pp. 106-121, 2009. 

[24] R. Chartrand and V. Staneva, “Restricted isometry properties and 
nonconvex compressive sensing,” Inverse Problems, vol. 24, no. 3, 2008. 

[25] R. Wu and D. Chen, “The improved bounds of restricted isometry 
constant for recovery via £ p minimization,” IEEE Trans. Inf. Theory, 
vol. 59, 2013. 

[26] T. Zhang, “Analysis of multi-stage convex relaxation for sparse regu¬ 
larization,” J. Mach. Learn. Res., vol. 11, pp. 1081-1107, 2010. 

[27] C. Zhang and T. Zhang, “A general theory of concave regularization for 
high-dimensional sparse estimation problems,” Statistical Science, vol. 
27, no. 4, pp. 576-593, 2012. 

[28] R. Chartrand, “Nonconvex compressed sensing and error correction,” in 
IEEE Int. Conf. Acoust. Speech Signal Process., 2007, pp. 889-892. 

[29] Y. Shen, J. Fang, and H. Li, “Exact reconstruction analysis of log-sum 
minimization for compressed sensing,” IEEE Signal Process. Lett., vol. 
20, no. 12, pp. 1223-1226, 2013. 

[30] D. Needed, “Noisy signal recovery via iterative reweighted 11- 
minimization,” in Asilomar Conference on Signals, Systems, and 
Computers, 2009, pp. 113-117. 

[31] M. Malek-Mohammadi, M. Babaie-Zadeh, A. Amini, and C. Jutten, 
“Recovery of low-rank matrices under affine constraints via a smoothed 
rank function,” IEEE Trans. Signal Process., vol. 62, no. 4, pp. 981-992, 
2014. 

[32] D. Hunter and K. Lange, “A tutorial on MM algorithms,” The American 
Statistician, vol. 58, no. 1, pp. 30-37, 2004. 

[33] P. Bickel, Y. Ritov, and A. Tsybakov, “Simultaneous analysis of lasso 
and dantzig selector,” The Annals of Statistics, pp. 1705-1732, 2009. 




14 


[34] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding 
algorithm for linear inverse problems,” SIAM Journal on Imaging 
Sciences, vol. 2, no. 1, pp. 183-202, 2009. 

[35] R. M. Corless, G. H. Gonnet, D. EG Hare, D. J. Jeffrey, and D. E. Knuth, 
“On the Lambert W function,” Advances in Computational mathematics, 
vol. 5, no. 1, pp. 329-359, 1996. 

[36] F. Clarke, Optimization and nonsmooth analysis, SIAM, Philadelphia, 
1990. 

[37] T. Blumensath and M. E. Davies, “Iterative hard thresholding for 
compressed sensing,” Appl. Comput. Harmon. Anal., vol. 27, no. 3, 
pp. 265-274, 2009. 

[38] T. Zhang, “Multi-stage convex relaxation for feature selection,” 
Bernoulli, vol. 19, no. 5B, pp. 2277-2293, 2013. 

[39] Zh. Wang, H. Liu, and T. Zhang, “Optimal computational and statistical 
rates of convergence for sparse nonconvex learning problems,” Annals 
of statistics, vol. 42, no. 6, pp. 2164, 2014. 

[40] C. Zhang, “Nearly unbiased variable selection under minimax concave 
penalty,” The Annals of Statistics, vol. 38, no. 2, pp. 894-942, 2010. 

[41] J. Liu, J. Jin, and Y. Gu, “Relation between exact and robust recovery 
for f-minimization: A topological viewpoint,” in Proc. IEEE Int. Symp. 
Inform. Theory, 2013, pp. 859-863. 

[42] Y. Zhang, “Theory of compressive sensing via i\ minimization: A non- 
RIP analysis and extensions,” Technical report tr08-11 revised, Dept, of 
Computational and Applied Mathematics, Rice University. 

[43] E. Candes and J. Romberg, li i\ -magic: Recovery of sparse signals 
via convex programming,” http://users.ece.gatech.edu/~justin/llmagic/, 
2005. 

[44] A. Belloni, V. Chemozhukov, and L. Wang, “Square-root lasso: pivotal 
recovery of sparse signals via conic programming,” Biometrika, vol. 98, 
no. 4, pp. 791-806, 2011. 

[45] Z. Ben-Haim, Y. Eldar, and M. Elad, “Coherence-based performance 
guarantees for estimating a sparse vector under random noise,” IEEE 
Trans. Signal Process., vol. 58, no. 10, pp. 5030-5043, 2010. 

[46] R. Chartrand and W. Yin, “Iteratively reweighted algorithms for 
compressive sensing,” in IEEE Int. Conf. Acoust. Speech Signal Process., 
2008, pp. 3869-3872. 

[47] J. Fan and R. Li, “Variable selection via nonconcave penalized 
likelihood and its oracle properties,” Journal of the American statistical 
Association, vol. 96, no. 456, pp. 1348-1360, 2001. 

[48] A. Eftekhari, M. Babaie-Zadeh, C. Jutten, and H. Abrishami Moghad- 
dam, “Robust-SLO for stable sparse representation in noisy settings,” in 
IEEE Int. Conf. Acoust. Speech Signal Process., 2009, pp. 3433-3436. 

[49] R. Chartrand, “Nonconvex splitting for regularized low-rank + sparse 
decomposition,” IEEE Trans. Signal Process., vol. 60, no. 11, pp. 5810— 
5819, 2012. 

[50] R. Chartrand, “Fast algorithms for nonconvex compressive sensing: MRI 
reconstruction from very few data,” in IEEE International Symposium 
on Biomedical Imaging, 2009, pp. 262-265. 

[51] D. Malioutov and A. Aravkin, “Iterative log thresholding,” in IEEE Int. 
Conf. Acoust. Speech Signal Process., 2014. 

[52] G. Gasso, A. Rakotomamonjy, and S. Canu, “Recovering sparse signals 
with a certain family of nonconvex penalties and dc programming,” 
IEEE Trans. Signal Process., vol. 57, no. 12, pp. 4686-4698, 2009. 

[53] A. Miller, Subset selection in regression, CRC Press, 2002. 

[54] W. Ortega, J. Rheinboldt, Iterative solution of nonlinear equations in 
several variables, vol. 30, SIAM, 2000. 

[55] R. Rockafellar and R. Wets, Variational analysis. Springer, 1998. 




Fig. 6. Comparison of two variants of the SCSA algorithm in solving the 
noisy problem with settings similar to those of Fig. Bl except that the sparse 
vectors are Rademacher distributed. Trials are repeated 500 times, and results 
are averaged over them. Since SCSA-FIT and SCSA-IT as well as SCAD and 
IST-0.5 have very similar performances in terms of MSNR rec , their traces are 
almost coincident in the top plot. 






























































































